METHOD FOR AUTOMATICALLY PROVIDING A COMPRESSED RENDITION
OF A VIDEO PROGRAM IN A FORMAT SUITABLE FOR ELECTRONIC SEARCHING
AND RETRIEVAL

TECHNICAL FIELD

This invention relates generally to a method for automatically
providing a compressed rendition of a video program in a format
suitable for electronic searching and retrieval, and more
particularly to a method for providing a compressed rendition of
a video program in a format suitable for electronic searching and
retrieval on the World Wide Web.

BACKGROUND

The rapid growth of the World Wide Web began with the
development of an on-line browser having a graphical user
interface. Graphical interfaces provide a number of important
advantages, including the ability to rapidly scroll through a
document to get to a particular point of interest. Moreover, the
ability to interact with a medium other than text (i.e. images or
audio) increases the rate at which information can be conveyed
since an image often conveys an idea faster and more efficiently
than text.

While graphical browsers provide an adequate interface for text
and images, they provide an inadequate interface for video
programs. The sequential nature of the video and audio components
of a video program impedes rapid access to such programs on the
World Wide Web by graphical browser. Furthermore, because of the
limited bandwidth of networks supporting the World Wide Web, and
particularly the limitations of most users' connections to such
networks, it takes a long time to transmit a program with its
full content. For example, at a connection speed of 28,800 bits
per second, it could take up to about 45 minutes to transmit even
a three or four minute audiovisual segment with sound and
full-motion video. As a result, video program providers sometimes
form a compressed version of the video program by manually
extracting and retaining selected frames from the program while
other frames are discarded. The selected frames and accompanying
text, typically taken from a transcript of the program, result in
a document that may subsequently be made available over the World
Wide Web.

However, the generation of this document is typically a tedious
and time consuming task since it must be created by a manual
process.

Accordingly, it would be advantageous to provide a rendition of a
video program which can be automatically generated and which
allows easy interaction with graphical browsers with a minimum of
information loss.

SUMMARY OF THE INVENTION

The present inventors have realized that a pictorial transcript
representation of a video program is particularly well suited for
on-line searching and retrieving applications such as browsing on
the World Wide Web. Pictorial transcripts are compact
representations of video programs which are automatically
generated by selecting representative frames or images from the
video program and combining them with a second media component
such as audio or text which is associated with each
representative frame. Properly chosen, the representative frames
convey a substantial portion of the information content of the
original video program.

Moreover, pictorial transcripts may be generated in an automatic
fashion, thus eliminating the substantial time and effort that
was previously required to place a document of this type on the
World Wide Web.

The inventive method provides a compressed rendition of a video
program in a format suitable for electronic searching and
retrieval. An electronic pictorial transcript representation of
the video program is initially received. The video program has a
video component and a second information-bearing media component
associated therewith. The pictorial transcript representation
includes a representative frame from each segment of the video
component of the video program and a portion of the second media
component associated with the segment. The electronic pictorial
transcript is transformed into a hypertext format to form a
hypertext pictorial transcript. The hypertext pictorial
transcript is subsequently recorded in an electronic medium.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is an example of one page of a printed pictorial
transcript generated from a television news program in accordance
with the method of the present invention.

FIG. 2 illustrates the use of server push for viewing an HTML
pictorial transcript.

FIG. 3 shows an example of a page format that may be employed
when performing keyword searching.

FIG. 4 shows an example of an index that may be generated for
HTML pictorial transcripts.

DETAILED DESCRIPTION

A method for automatically compressing [...] 2, 1994, pending,
and Shahraray B. and Gibbon D., "Automatic Generation of
Pictorial Transcripts of Video Programs," Multimedia Computing
and Networking 1995, Proc. SPIE 2417, Feb. 1995, the latter
reference being hereby incorporated by reference. In accordance
with this method, a video program is compressed by selecting
[...] representative frames. For example, a single frame may be
used to represent the visual information contained in any given
scene of the video program. A scene may be defined as a segment
of the video program over which the visual contents do not change
significantly. Thus, a frame selected from the scene may be used
to represent the entire scene without losing a substantially
large amount of information. A series of such representative
frames from all the scenes in the video program provides a
reasonably accurate representation of the entire video program
with an acceptable degree of information loss. These compression
methods in effect perform a content-based sampling of the video
program. Additional information may be found in B. Shahraray,
"Scene Change Detection and Content-Based Sampling of Video
Sequences," Digital Video Compression: Algorithms and
Technologies 1995, SPIE 2419.

In the previously cited documents, a plurality of representative
frames are selected by sampling the video program in a
content-based manner to retain a single representative frame from
each scene. While the series of frames selected in this manner
may not contain all the visual information in the original video
program, when combined with another medium that was a part of the
original video program, such as audio or closed-captioned text,
the resulting multimedia program adequately conveys the
information content of the video program in a condensed format.
To generate this condensed multimedia program, a correspondence
must be formed between the representative frames and the audio or
textual medium. For example, each representative frame
should be associated with the portion of the audio or textual
medium corresponding to the entire scene from which the
representative frame was selected. This correspondence may
be accomplished in a relatively simple manner because in 
the original video program the video medium is already 
synchronized with the audio or textual information. Addi- 
tional details concerning the formulation of this correspon- 
dence may be found in the previously cited references. 
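
As an illustration only, the correspondence described above can be sketched by using the timing that the original program already provides. The sketch below assumes that each scene is known by its start and end time and that each closed-caption fragment carries a timestamp; the names `Scene`, `Caption`, and `attach_captions` are hypothetical and do not come from the cited references.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    start: float      # scene start time in seconds (assumed available)
    end: float        # scene end time in seconds, exclusive
    frame_file: str   # file holding the representative frame


@dataclass
class Caption:
    time: float       # time at which the caption fragment appears
    text: str


def attach_captions(scenes, captions):
    """Pair each representative frame with the caption text of its scene."""
    records = []
    for scene in scenes:
        # A caption belongs to a scene if its timestamp falls inside it.
        words = [c.text for c in captions if scene.start <= c.time < scene.end]
        records.append((scene.frame_file, " ".join(words)))
    return records
```

The returned list of (frame, text) pairs is one possible in-memory form of the condensed data discussed in the next paragraph.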

The representative frames, the audio or textual compo- 
nents associated therewith, and the correspondence between 
the representative frames and the audio or textual compo- 
nents comprise electronic data representing a condensed 
version of a video program, which hereinafter will be 
referred to as the condensed electronic data. 

In the case of closed-captioned text, a printed rendition of 
the condensed electronic data may be provided. The printed 
rendition constitutes a so-called pictorial transcript in which
each representative frame is printed with a caption containing
the portion of the closed-caption text corresponding to
the scene from which that representative frame is taken. 
FIG. 1 is an example of one page of printed pictorial 
transcript generated from a television news program. 
Alternatively, rather than printing the condensed electronic 
data as a pictorial transcript, the data simply may be elec- 
tronically stored for subsequent retrieval. Thereafter the data 
may be printed, displayed on a computer, or transmitted in 
any desired format. 

In addition, the condensed electronic data may be gener- 
alized further to refer to the series of representative frames 
and the audio segments corresponding thereto rather than 
closed-caption segments. In this case the condensed electronic
data may be conveniently stored electronically and
then displayed by sequentially displaying the representative
frames and, simultaneous with each displayed frame, playing
the corresponding audio segment.

In accordance with the present invention, electronic data 
representing a condensed version of a video program is 
formatted in hypertext markup language (HTML) so that the
resulting HTML document is compatible with the World
Wide Web. HTML documents refer to on-line documents 
having words or graphics that contain links to other on-line 
documents. Such documents are commonly referred to as 
hypertext documents. By selecting the link (using a mouse 
or key command) the user is connected to another document 
that may be located on the same or a different computer. It 
should be noted that while the present invention is described 
in terms of an on-line document formatted in HTML, more 
generally the present invention is applicable to hypertext 
documents formatted in languages other than HTML, such
as HyperCard, for example.

An HTML document is automatically produced from the
condensed electronic data by an HTML generator, which 
converts the data into an HTML document. Procedures to 
implement such a generator are well known. As used 
hereinafter, the terms HTML document and HTML pictorial
transcript refer to the condensed electronic data that is
formatted in HTML. The HTML document or pictorial
transcript may be composed of individual records connected 
by links. The individual records of the HTML document or 
pictorial transcript are referred to as pages. 

The HTML pictorial transcript may be advantageously 
divided over two or more HTML pages, depending on the 
size of the document. An HTML document consisting of
only a single HTML page is impractical for all but the
shortest programs (e.g., less than ten minutes in length)
because WWW browsers, which sometimes lack parallel
loading capability, begin to exhibit unacceptable delays. In
fact, even browsers having parallel loading capability such
as Netscape will often be taxed. The size of each HTML
page may be determined in any appropriate manner. For
example, the HTML generator may begin a new page after
a predetermined number of images (e.g., 25) have been
placed on a single page. Alternatively, the pages may be
divided on the basis of story and topic based segmentation.
The various pages comprising the HTML document may be
connected by hypertext links.
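
A minimal sketch of such a generator follows, assuming the condensed electronic data is available as a list of (image URL, caption) pairs. The function name `build_pages`, the `pageN.html` file naming, and the markup layout are illustrative assumptions rather than the format used by the inventors; only the rule of starting a new page after a fixed number of images (25 by default) and linking the pages is taken from the text.

```python
from html import escape


def build_pages(records, title, images_per_page=25):
    """Split (image_url, caption) records into linked HTML pages.

    Returns a dict mapping a hypothetical file name to its HTML text.
    """
    # Start a new page after every `images_per_page` images.
    chunks = [records[i:i + images_per_page]
              for i in range(0, len(records), images_per_page)] or [[]]
    pages = {}
    for n, chunk in enumerate(chunks, start=1):
        body = [f"<h1>{escape(title)} (page {n} of {len(chunks)})</h1>"]
        for image_url, caption in chunk:
            body.append(
                f'<p><img src="{escape(image_url)}" alt=""><br>'
                f"{escape(caption)}</p>"
            )
        # Hypertext links connect each page to its neighbours.
        nav = []
        if n > 1:
            nav.append(f'<a href="page{n - 1}.html">Previous</a>')
        if n < len(chunks):
            nav.append(f'<a href="page{n + 1}.html">Next</a>')
        if nav:
            body.append("<p>" + " | ".join(nav) + "</p>")
        pages[f"page{n}.html"] = (
            "<html><head><title>" + escape(title) + "</title></head><body>\n"
            + "\n".join(body)
            + "\n</body></html>\n"
        )
    return pages
```

Splitting on story or topic boundaries instead would only change how `chunks` is computed; the linking step stays the same.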

A graphical browser is a graphical interface that can 
access documents on the WWW in an HTML format. The 
HTML pictorial transcript may be conveniently accessed 
and searched using conventional graphical browsers such as
Mosaic, Spry and Explorer, for example. 

The HTML pictorial transcript may be displayed in a 
variety of different formats. The user may have the option of 
selecting among several predetermined formats, or 
alternatively, the user may customize a format via the web
browser. The server, in turn, re-executes the HTML generator
routine, which now produces the HTML document in the
desired format. Additionally, if no selection is made, the
HTML transcript may be displayed in a default format
(which may be one of the standard formats). In some
embodiments of the invention, the user may be provided
with a plurality of different default formats from which to
choose.

In one embodiment of the invention, a standard or default
format displays an HTML pictorial transcript that is the
equivalent of the printed rendition of a pictorial transcript
such as shown in FIG. 1. Other formats may modify this
particular format to reduce retrieval time and improve page
layout. For example, some formats may be employed to
reduce the required bandwidth by displaying only a subset of
the representative frames contained in the HTML pictorial
transcript. Many different criteria may be employed to
determine which representative frames to retain and which
to omit.

One criterion that may be used to eliminate select representative
frames is based on the presence of redundant frames.
For example, if the original program contains a shot of a
given scene at one time and subsequently contains substantially
the same scene after one or more other scenes have
intervened, the resulting pictorial transcript will contain two
representative frames that are substantially the same.
Accordingly, one of the redundant representative frames
may be eliminated to reduce bandwidth. In the resulting
HTML pictorial transcript it may be desirable to use a
hypertext link in place of the second appearance of the
redundant representative frame which links back to the first
appearance of the representative frame.
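
The following sketch shows one way this replacement could be organized. The text does not say how two frames are judged to be substantially the same, so the sketch assumes each frame comes with a short numeric signature (for example, a coarse intensity histogram) and treats frames whose mean absolute difference is at or below a threshold as redundant. The signature, the threshold value, and the function names are all assumptions made for illustration.

```python
def _distance(a, b):
    """Mean absolute difference between two equal-length signatures."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def link_redundant_frames(frames, threshold=0.1):
    """Decide, frame by frame, whether to show the image or a link.

    frames: list of (frame_id, signature) in program order.
    Returns a list of ("image", frame_id) or ("link", earlier_frame_id).
    """
    kept = []     # frames that will actually be transmitted
    layout = []
    for frame_id, signature in frames:
        # Look for an earlier retained frame that is nearly identical.
        match = next(
            (kept_id for kept_id, kept_sig in kept
             if _distance(signature, kept_sig) <= threshold),
            None,
        )
        if match is None:
            kept.append((frame_id, signature))
            layout.append(("image", frame_id))
        else:
            # Second appearance: link back to the first appearance.
            layout.append(("link", match))
    return layout
```

An HTML generator could then emit an image element for each "image" entry and a hypertext anchor pointing at the earlier frame for each "link" entry.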

Other criteria that may be used to eliminate select representative
frames are based on random subsampling (e.g., retain
every other representative frame) or, alternatively, the size of
the JPEG image file. For example, it may be desirable to
retain only the largest of the image files on the assumption
that image size is correlated with the complexity of the
image. More complex images typically convey more information.
Conversely, it may be desirable to retain only the
smallest of the image files to further minimize bandwidth
requirements. Alternatively, it may be advantageous to retain
only representative images that differ from one another by
more than a prescribed amount, as determined by scene
matching techniques. The representative images that are
eliminated in this manner may be replaced by hypertext
anchors linked to the similar representative images that were
retained.
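
Two of these criteria are simple enough to sketch directly. The code below assumes each representative frame is described by a (file name, size in bytes) pair with unique file names; the function names are hypothetical, and the choice of how many frames to keep is left to the caller.

```python
def every_other(frames):
    """Subsample by retaining every other representative frame."""
    return frames[::2]


def keep_by_size(frames, count, largest=True):
    """Retain `count` frames chosen by JPEG file size.

    frames: list of (file_name, size_bytes) in program order.
    largest=True keeps the biggest files (assumed to be the most
    complex images); largest=False keeps the smallest files to
    minimize bandwidth. Program order is preserved in the result.
    """
    ranked = sorted(frames, key=lambda f: f[1], reverse=largest)[:count]
    chosen = {name for name, _ in ranked}
    return [f for f in frames if f[0] in chosen]
```

For example, with frames of 300, 900, 500 and 100 bytes, `keep_by_size(frames, 2)` retains the 900 and 500 byte images, while `largest=False` retains the 300 and 100 byte images.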



